
Journal of Personalized Medicine

MDPI AG

Preprints posted in the last 7 days, ranked by how well they match Journal of Personalized Medicine's content profile, based on 28 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.

1
A bibliometric review of explainable AI in diabetes risk prediction: Trends, gaps, and knowledge graph opportunities

Van, T. A.

2026-04-20 health informatics 10.64898/2026.04.16.26351069 medRxiv
Top 0.4%
1.6%

Background: Type 2 diabetes mellitus (T2DM) is a leading global public health challenge. Machine learning (ML) combined with Explainable AI (XAI) is increasingly applied to T2DM risk prediction, but the field lacks a quantitative overview of methodological trends and integration gaps. Methods: We present a structured synthesis and critical analysis of the XAI literature on T2DM risk prediction, combining (i) a quantitative bibliometric analysis of a two-database corpus (N = 2,048 documents from Scopus and PubMed/MEDLINE, deduplicated via a transparent three-tier pipeline) and (ii) an in-depth selective review of 15 highly cited papers. Reporting follows PRISMA 2020, adapted for metadata-based synthesis; analyses include keyword frequency, rule-based thematic clustering, and publication trend analysis. Results: The field grew rapidly, from 36 documents (2020) to 866 (2025). SHAP and LIME dominate XAI methods; XGBoost and Random Forest dominate ML models. Critically, knowledge graph (KG) and graph neural network (GNN) terms appeared in only 17 documents (~0.83%) compared with 906 for XAI methods, a 53.3:1 disparity. This gap is consistent across both databases, which share 33.2% of their records, ruling out a single-database artifact. The selective review confirmed that none of the 15 highly cited papers combined all three components (ML, XAI, and KG) in T2DM risk prediction. Conclusions: The XAI for T2DM risk prediction field exhibits a clinical interpretability gap: statistical explanations are rarely linked to structured clinical pathways. We propose a three-layer conceptual framework (Predictive → Explainability → Knowledge) that integrates KG as a supplementary semantic layer, with potential applications in clinical decision support and population-level screening. The framework does not perform true causal inference but structures explanations around established pathophysiological knowledge.
This study contributes a transferable methodology and a quantified research gap to guide future work integrating ML, XAI, and structured medical knowledge.
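The keyword-frequency comparison behind the 53.3:1 figure can be illustrated with a minimal rule-based matcher. This is a sketch only: the regular expressions below are hypothetical stand-ins, since the authors' actual term dictionaries are not given in the abstract; only the reported counts (17 KG/GNN documents, 906 XAI documents, N = 2,048) come from the source.

```python
import re

# Hypothetical term patterns; the study's actual keyword dictionaries are not reproduced here.
XAI_PATTERN = re.compile(r"\b(shap|lime|explainab\w*|interpretab\w*)\b", re.IGNORECASE)
KG_PATTERN = re.compile(r"\b(knowledge graphs?|graph neural networks?|gnns?)\b", re.IGNORECASE)


def count_documents(docs: list[str], pattern: re.Pattern) -> int:
    """Count documents whose text matches the pattern at least once."""
    return sum(1 for doc in docs if pattern.search(doc))


def disparity(xai_docs: int, kg_docs: int) -> float:
    """Ratio of XAI-method documents to KG/GNN documents."""
    if kg_docs == 0:
        raise ValueError("no KG/GNN documents; ratio undefined")
    return xai_docs / kg_docs


# Toy titles for illustration only.
titles = [
    "SHAP explanations for XGBoost diabetes risk prediction",
    "A knowledge graph for T2DM screening",
    "Random forest baseline for glucose forecasting",
]
print(count_documents(titles, XAI_PATTERN), count_documents(titles, KG_PATTERN))

# Reported counts from the abstract.
print(f"KG/GNN share: {17 / 2048:.2%}; disparity: {disparity(906, 17):.1f}:1")
```

Word-boundary patterns avoid counting substrings such as "lime" inside unrelated words, but any real dictionary would need manual validation against a sample of records.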

2
The Golden Opportunity or the Cutting Room Floor? Quantifying and Characterizing the Loss and Addition of Social Determinants of Health during Clinician Editing of Ambient AI Documentation

Kim, S.; Guo, Y.; Sutari, S.; Chow, E.; Tam, S.; Perret, D.; Pandita, D.; Zheng, K.

2026-04-22 health systems and quality improvement 10.64898/2026.04.20.26351322 medRxiv
Top 0.7%
1.0%

Social determinants of health (SDoH) are important for clinical care, but it remains unclear how much AI-captured social context is preserved after clinician editing in ambient documentation workflows. We retrospectively analyzed 75,133 paired ambient AI-drafted and clinician-finalized note sections from ambulatory care at a large academic health system. Using a rule-based NLP pipeline, we extracted 21 SDoH categories and quantified retention, deletion, and addition. SDoH appeared in 25.2% of AI drafts versus 17.2% of final notes. At the mention level, AI captured 29,991 SDoH mentions, of which 45.1% were deleted and 54.9% were retained; clinicians added 3,583 new mentions. Insurance and marital status were most often deleted, whereas substance use and physical activity were more often retained. Deletion patterns also varied by specialty, supporting the need for specialty-aware ambient AI systems.
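The retention, deletion, and addition counts can be pictured as multiset differences between the SDoH mentions extracted from an AI draft and from its finalized note. The sketch below assumes each mention has already been normalized to a category label by the NLP pipeline; the matching rule the authors used is not described in the abstract, so matching on category alone is an assumption.

```python
from collections import Counter


def compare_mentions(ai_mentions: list[str], final_mentions: list[str]) -> dict[str, int]:
    """Multiset comparison of SDoH mentions in an AI draft vs. the clinician-final note.

    Assumption: mentions are matched on their normalized category label only.
    """
    ai, final = Counter(ai_mentions), Counter(final_mentions)
    return {
        "retained": sum((ai & final).values()),  # present in both versions
        "deleted": sum((ai - final).values()),   # drafted by AI, removed by clinician
        "added": sum((final - ai).values()),     # introduced by clinician
    }


counts = compare_mentions(
    ["insurance", "marital_status", "substance_use"],
    ["substance_use", "physical_activity"],
)
deletion_rate = counts["deleted"] / (counts["retained"] + counts["deleted"])
print(counts, f"deletion rate: {deletion_rate:.1%}")
```

Because retained and deleted mentions together account for every AI-captured mention, the reported 45.1% and 54.9% sum to 100%, while clinician-added mentions are counted separately.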

3
A profile analysis of peripherally inserted central catheters implanted over 10 years in a quaternary hospital

da Luz, C. C.; Sorbello, C. C. J.; Epifanio, E. A.; dos Santos, C. d. A.; Brandi, S.; Guerra, J. C. d. C.; Wolosker, N.

2026-04-23 health systems and quality improvement 10.64898/2026.04.22.26351492 medRxiv
Top 0.7%
1.0%

Background: Vascular access is essential in treating patients undergoing prolonged intravenous therapy such as chemotherapy, antibiotics, and parenteral nutrition. Since the 1990s, when PICCs (peripherally inserted central catheters) emerged, vascular access options have expanded significantly, revolutionizing the treatment landscape for all types of patients. Objective: To analyze and describe the profile of PICC use in a Brazilian quaternary hospital over 10 years using data collected by the infusion therapy team, evaluating the number of PICCs implanted over the years, patient epidemiology and clinical characteristics, insertion details, associated complications, and reasons for removal. Methods: A retrospective cohort study using a quantitative, non-experimental approach to classify and statistically analyze past events associated with 21,652 PICCs implanted from January 2012 to December 2021 in a quaternary hospital in São Paulo, Brazil. All catheters were implanted, and the data collected, by a team of nurses specializing in infusion therapy. We analyzed the number of catheters implanted over the years, insertion characteristics, patient epidemiology and clinical data, possible associated complications, and reasons for removal. Statistical analyses were conducted using R software (version 4.4.1) and SPSS (version 29) for Windows (IBM Corp, Armonk, NY). Results: During the specified period, 21,652 catheters were analyzed. The patients' gender distribution was nearly balanced (48.2% versus 51.8%), and the average age was 66 years. Cardiovascular and metabolic conditions were the most common comorbidities, and between 2020 and 2021, 29.3% of the sample tested positive for COVID-19. The most common location of hospitalization and implantation was the medical-surgical clinic (31.6%-41.4%), and the most used type of catheter was the Power PICC (83.9%).
The estimated complication incidence density was 2.94 complications per 1,000 catheter-days. Almost all PICCs (98.2%) were correctly positioned at the cavoatrial junction on the first attempt, 82.2% of catheters were removed after completion of therapy, and the median duration of catheter use was 12 days. Conclusion: PICCs are widely used for drug infusion, and their use has grown progressively owing to the greater availability and training of specialized teams. The high efficiency and relatively low complication risk of these devices, already observed in previous studies, were reinforced by the findings of this study of more than 20,000 catheters.
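Incidence density expresses events per unit of catheter exposure rather than per catheter, which accounts for catheters staying in place for different lengths of time. A minimal sketch of the calculation follows; the event and catheter-day totals in the example are illustrative, not the study's (the abstract reports only the resulting rate).

```python
def incidence_density(events: int, catheter_days: float, per: float = 1000.0) -> float:
    """Events per `per` catheter-days of exposure."""
    if catheter_days <= 0:
        raise ValueError("catheter_days must be positive")
    return events * per / catheter_days


# Illustrative totals only: 30 complications over 10,000 catheter-days.
print(incidence_density(30, 10_000))  # 3.0 complications per 1,000 catheter-days
```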

4
How can AI be compatible with evidence-based medicine?: with an example of analysis of lung cancer recurrence

Usuzaki, T.; Matsunbo, E.; Inamori, R.

2026-04-25 radiology and imaging 10.64898/2026.04.17.26351114 medRxiv
Top 1%
0.8%

Despite the remarkable progress of artificial intelligence, exemplified by large language models, how AI technologies can contribute to building evidence in evidence-based medicine (EBM) remains an overlooked issue. An AI that is compatible with EBM is therefore needed. In the present paper, we propose an example analysis that may contribute to this approach, using a variable Vision Transformer.

5
Loss of autism-associated gene wac alters social behavior and identifies cho-1 as a modulator of cholinergic signaling in C. elegans

Kim, D.-W.; Boonpraman, N.; Kuhn, N. C.; Sammi, S. R.

2026-04-21 neuroscience 10.64898/2026.04.17.719318 medRxiv
Top 1%
0.8%

WAC is an autism-associated gene involved in neurodevelopment. However, the effects of reduced WAC function on behavior and synaptic regulation in vivo remain unclear. Building on previous studies of the wac gene and the C. elegans model of ASD, we examined the effects of wac gene deletion on food-leaving behavior, a known parameter linked to ASD-associated genes, along with the cholinergic pathway. wac-deficient worms exhibited curtailed food-leaving behavior. Notably, this phenotype was similar to that of nematodes carrying a mutation in the ASD-related gene neuroligin. In addition, wac-deficient worms showed impaired growth, reduced pharyngeal pumping, and reduced lifespan. To examine potential synaptic mechanisms, we analyzed expression of genes related to cholinergic signaling across all larval stages (L1-L4) through the young adult (YA) stage. Stage-specific transcriptional changes were observed, with increased expression of ace-1 and acr-3 at L1, acr-3 at L3, and acr-3, cha-1, lev-1, and lev-10 at L4. The transcriptomic alteration was most prominent at the YA stage, with upregulation of ace-1, cha-1, cho-1, lev-1, lev-10, unc-17, unc-29, unc-38, and unc-50. To identify a specific suppressor of up-modulated acetylcholine (ACh) signaling, RNAi of the upregulated genes was performed, and cho-1 was identified as a specific suppressor of elevated ACh signaling. cho-1 encodes a high-affinity choline transporter responsible for choline uptake in the presynaptic terminal. These studies identify molecular mechanisms underlying the up-modulation of cholinergic signaling in wac mutant worms.

6
Narcolepsy is associated with cardiovascular burden

Ollila, H. M.; Eghtedarian, R.; Haapaniemi, H.; Ramste, M.; FinnGen

2026-04-23 epidemiology 10.64898/2026.04.22.26351468 medRxiv
Top 1%
0.8%

Background: Narcolepsy is a debilitating sleep disorder caused by hypocretin deficiency. Aside from its role in promoting wakefulness, hypocretin is linked to the regulation of appetite and metabolism, and its deficiency often results in weight gain. Study objectives: We aimed to characterize the epidemiological association between narcolepsy and major cardiometabolic outcomes. Methods: We analyzed cardiovascular and metabolic disease distribution in the FinnGen study. Using longitudinal electronic health records, we assessed associations between narcolepsy, cardiac/metabolic markers, and prescriptions for relevant drugs. Results: Our findings demonstrate significant associations between narcolepsy and metabolic traits (OR [95% CI] = 2.65 [1.81, 3.89]) as well as stroke (OR = 2.36 [1.38, 4.04]). Patients with narcolepsy exhibit a less favourable metabolic profile, including higher glucose levels (OR = 1.1143 [1.0599, 1.1715]) and dyslipidaemia. This is supported by increased prescriptions of insulin (OR = 2.269 [1.46, 3.53]), simvastatin (OR = 2.292 [1.59, 3.31]), and metformin (OR = 2.327 [1.66, 3.25]), reflecting substantial metabolic disturbance. Furthermore, positive associations with antihypertensive and antiplatelet medications were observed, consistent with elevated cardiovascular risk. Conclusion: Taken together, our findings highlight the cardiometabolic burden in narcolepsy. This study enhances understanding of the metabolic and cardiovascular consequences of narcolepsy and offers timely guidance for effective disease control.
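The odds ratios above are reported with 95% confidence intervals. For readers unfamiliar with the notation, the sketch below computes an unadjusted odds ratio and a Wald interval from a 2x2 table. It is only an illustration with made-up counts: the abstract does not state whether the reported estimates are covariate-adjusted, and this sketch performs no adjustment.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald 95% CI from a 2x2 table.

    a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell count; Wald interval undefined")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Made-up counts for illustration only.
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f} [{lo:.2f}, {hi:.2f}]")
```

Here the illustrative interval includes 1, so that toy association would not be significant at the 5% level, unlike the intervals reported above, all of which exclude 1.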

7
Addition of Bupropion or Varenicline to Nicotine Replacement Therapy After Acute Coronary Syndrome: A Propensity-Matched Real-World Analysis

Qadeer, A.; Gohar, N.; Maniyar, P.; Shafi, N.; Juarez, L. M.; Mortada, I.; Pack, Q. R.; Jneid, H.; Gaalema, D. E.

2026-04-23 cardiovascular medicine 10.64898/2026.04.21.26351432 medRxiv
Top 1%
0.7%

Introduction: Smoking cessation after acute coronary syndrome (ACS) is a Class I recommendation, yet prescription pharmacotherapy use remains low and its real-world cardiovascular effectiveness when added to nicotine replacement therapy (NRT) is poorly characterized. Methods: We conducted a retrospective cohort study using the TriNetX US Collaborative Network (67 healthcare organizations). Adults hospitalized with ACS who received NRT within one month were identified, with NRT receipt serving as a proxy for active smoking status. Two co-primary propensity-matched comparisons (1:1, 50 covariates, caliper 0.10 SD) evaluated bupropion + NRT and varenicline + NRT individually versus NRT alone; a supportive analysis evaluated combined pharmacotherapy versus NRT alone. All-cause mortality was the primary endpoint. Secondary outcomes included MACE, heart failure exacerbations, major bleeding, TIA/stroke, emergency rehospitalizations, and cardiac rehabilitation utilization, assessed at 6 months and 1 year via Kaplan-Meier analysis. Hazard ratios (HRs) greater than 1.0 indicate higher hazard in the NRT-only group. Results: After matching, the combined analysis comprised 8,574 pairs, the bupropion analysis 4,654 pairs, and the varenicline analysis 2,126 pairs. At 1 year, the combined pharmacotherapy group had significantly lower all-cause mortality (HR 1.26, 95% CI 1.16-1.37), MACE (HR 1.16, 95% CI 1.12-1.21), heart failure exacerbations (HR 1.16, 95% CI 1.08-1.25), and major bleeding (HR 1.18, 95% CI 1.08-1.28), and greater cardiac rehabilitation utilization (HR 0.82, 95% CI 0.74-0.92; all p < 0.001). TIA/stroke did not differ significantly. Six-month results were consistent. Varenicline and bupropion each showed lower mortality and MACE. A urinary tract infection falsification endpoint showed no between-group differences, supporting matching validity. The pharmacotherapy group had higher rates of new-onset depression, driven predominantly by bupropion recipients.
Conclusions: In this propensity-matched real-world analysis, adding prescription smoking cessation pharmacotherapy to NRT after ACS was associated with lower mortality and fewer adverse cardiovascular events, supporting broader integration into post-ACS care pathways.
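Because HRs above 1.0 here denote higher hazard in the NRT-only group, readers used to the opposite convention may prefer the reciprocal. Swapping the reference group of a two-group hazard ratio inverts the estimate and both confidence limits (the limits also trade places). The sketch applies this to the reported 1-year all-cause mortality HR; it is a re-expression of the published numbers, not a new analysis.

```python
def swap_reference(hr: float, lower: float, upper: float) -> tuple[float, float, float]:
    """Re-express a two-group hazard ratio and its CI with the groups swapped."""
    return 1 / hr, 1 / upper, 1 / lower


# Reported 1-year all-cause mortality: HR 1.26 (95% CI 1.16-1.37), NRT-only vs. pharmacotherapy.
hr, lo, hi = swap_reference(1.26, 1.16, 1.37)
print(f"Pharmacotherapy vs. NRT-only: HR {hr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The inversion is exact because the interval is symmetric on the log-hazard scale, so negating the log HR maps the lower limit to the upper and vice versa.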

8
Research Paper on AuditMed: A Single-File, Browser-Based Clinical Evidence Audit Platform Architecture, Current Capabilities, and Proposed Applications in Drug Informatics and Pharmacy Education

Ferguson, D. J.

2026-04-20 health informatics 10.64898/2026.04.19.26351188 medRxiv
Top 2%
0.7%

Background: Clinical pharmacists, trainees, and educators rely on multi-database literature retrieval and structured evidence synthesis to answer drug-information questions. Existing workflows require navigation across PubMed, DailyMed, LactMed, interaction checkers, and specialty guideline repositories with manual de-duplication, appraisal, and synthesis. Commercial platforms that integrate these functions are costly and often unavailable in community, rural, and international training contexts. Objective: This report describes the architecture of AuditMed, a single-file, browser-based clinical evidence audit platform, and reports preliminary stress-test results against a complex multi-morbidity case corpus. AuditMed is intended for research and educational use and is not a substitute for clinical judgment or validated commercial clinical decision-support systems. Methods: AuditMed integrates nineteen free, publicly available clinical and biomedical application programming interfaces into a six-stage Search → Select → Parse → Analyze → Infer → Create pipeline and supports browser-local patient-case ingestion with regex-based HIPAA Safe Harbor de-identification. Preliminary stress-testing was conducted against eleven cases (Cases 30 through 40) from the Complex Clinical Case Compendium Software Validation Suite, each featuring over twenty concurrent active disease states. For each case, the one-click inference pipeline was executed with default settings and the full Clinical Inference Report was captured verbatim. No retrieval-sensitivity, synthesis-fidelity, or time-to-answer endpoints were pre-specified; the exercise was qualitative and oriented toward pipeline behavior under extreme multi-morbidity. Results: The pipeline completed without fatal errors for all eleven cases and produced a structured Clinical Inference Report in each instance.
Quantitative-finding detection performed as designed for hematologic parameters and cardiac biomarkers. Two parser defects were identified and are reproduced in the appendix: an age-as-fever regex-precedence defect affecting seven cases and a diagnosis-versus-medication parsing defect affecting one case. The evidence-linkage rate ranged from zero evidence-linked statements in seven cases to eleven in one case, reflecting the dependence of the inference layer on MeSH-indexed literature coverage of the specific case diagnoses. Conclusions: AuditMed is an early-stage, open-source platform whose value at this stage is in providing a free, transparent, auditable workflow for multi-source evidence synthesis with explicit uncertainty flagging. The preliminary results document both robust end-to-end completion under extreme case complexity and specific, reproducible parser defects that will be addressed before formal evaluation. Planned evaluation studies are described.

9
Cell-type specific allelic dampening of sex-linked genes in sex chromosome aneuploidy

Filippova, G. N.; Sanger, E.; MacDonald, J.; Fang, H.; Groneck, C.; Takasaki, M.; Meleshko, A.; Ma, W.; Liu, Y.; Li, G.; Zhang, R.; Murry, C. E.; Van Dyke, D.; Skakkebaek, A.; Gravholt, C. H.; Noble, W. S.; Bammler, T. K.; Young, J. E.; Deng, X.; Berletch, J.; Disteche, C. M.

2026-04-21 molecular biology 10.64898/2026.04.16.719032 medRxiv
Top 2%
0.6%

Common sex chromosome aneuploidies (SCAs) often present with cognitive and cardiovascular dysfunction in humans. To address SCA effects on gene expression and DNA methylation in relevant cell types, we differentiated neural precursor cells (NPCs) and cardiomyocytes (CMs) from human induced pluripotent stem cells (hiPSCs) with different numbers of sex chromosomes, including isogenic and independent lines. As expected, the expression of genes that escape X inactivation (escapees) mostly increases with the number of inactive X chromosomes (Xi). However, allelic analysis shows dampening of escapees specifically on the Xi in XXY compared to XX in both NPCs and CMs, revealing a novel type of dosage compensation in SCA. In contrast, Y-linked gene expression is higher in XXY versus XY NPCs, but the opposite is observed in CMs. This may explain the greater number of differentially expressed autosomal genes in NPCs versus CMs with an added Y chromosome, while effects of added X chromosomes are similar between cell types. Concordantly, changes in autosomal DNA methylation are mainly driven by the presence of a Y chromosome. These findings highlight the cell-type specificity of sex-linked and autosomal gene regulation in SCA conditions.
Highlights
- Sex chromosome aneuploidy induces cell-type specific changes in gene expression
- Dampening of the inactive X chromosome in XXY alleviates X overexpression
- High Y-linked gene expression in XXY neural precursor cells but not cardiomyocytes
- Sex chromosome aneuploidy disrupts sex biases in autosomal gene expression

10
Wearable Dual-Modality Plethysmography for Arterial Modulation and Blood Pressure Dip

Jung, S.; Thomson, S.

2026-04-21 physiology 10.64898/2026.04.17.719282 medRxiv
Top 2%
0.5%

Continuous, non-invasive cardiovascular monitoring is limited by the superficial sensing depth of photoplethysmography (PPG), which is susceptible to peripheral artifacts. This study evaluates a wearable dual-modality prototype integrating dry-electrode impedance plethysmography (IPG) and PPG within a smartwatch form factor. Results from a pilot study (N=2) demonstrate that IPG signals exhibit a temporal lead over PPG across ventral and dorsal sites, consistent with the greater penetration depth of IPG. During brachial artery modulation, IPG showed greater sensitivity to arterial recovery on the ventral forearm. Furthermore, during 60-minute napping sessions, PPG remained morphologically stable, whereas IPG signals evolved significantly, capturing distinct pulse-wave archetypes. These findings suggest that wearable IPG provides a high-fidelity window into deep systemic hemodynamics typically reserved for clinical instrumentation.
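One common way to quantify a temporal lead between two pulse waveforms is to find the lag that maximizes their cross-correlation. The abstract does not say how the IPG-PPG lead was measured (foot-to-foot or peak timing are alternatives), so the sketch below is an assumption-laden illustration on synthetic Gaussian pulses, not the authors' method; the 100 Hz sampling rate is hypothetical.

```python
import math


def best_lag(reference: list[float], delayed: list[float], max_lag: int) -> int:
    """Lag (in samples) by which `delayed` trails `reference`, chosen to
    maximize the mean-removed cross-correlation over the overlapping samples."""
    mean_r = sum(reference) / len(reference)
    mean_d = sum(delayed) / len(delayed)
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                score += (r - mean_r) * (delayed[j] - mean_d)
        if score > best_score:
            best, best_score = lag, score
    return best


FS_HZ = 100  # hypothetical sampling rate
ipg = [math.exp(-(((i - 50) / 5) ** 2)) for i in range(200)]  # synthetic IPG pulse
ppg = [math.exp(-(((i - 55) / 5) ** 2)) for i in range(200)]  # same pulse, 5 samples later
lag = best_lag(ipg, ppg, max_lag=20)
print(f"PPG trails IPG by {lag} samples = {lag * 1000 / FS_HZ:.0f} ms")
```

A positive lag means the PPG waveform arrives after the IPG waveform, i.e. IPG leads. Real recordings would need filtering and beat segmentation before this comparison is meaningful.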

11
Identifying clinician perceived priorities for a real-time wearable system for in-hospital monitoring: findings and evolutions following the COVID-19 pandemic

Vollam, S.; Roman, C.; King, E.; Tarassenko, L.

2026-04-24 health systems and quality improvement 10.64898/2026.04.21.26350610 medRxiv
Top 3%
0.4%

A Wearable Monitoring System (WMS), comprising a chest patch, wrist-worn pulse oximeter, and arm-worn blood pressure device, was developed in preparation for a pilot Randomised Controlled Trial (RCT) on a UK surgical ward. The system was designed to support continuous physiological monitoring and early detection of deterioration. An initial prototype user interface was developed by the research team based on prior clinical experience and engineering knowledge. To ensure suitability for clinical practice, iterative user-centred refinement was undertaken through a series of clinician focus groups and wearability assessments. Six focus groups involving multidisciplinary healthcare professionals were conducted between November 2019 and May 2021. Feedback from these sessions informed successive interface and system modifications. System development spanned the COVID-19 pandemic, during which the WMS was rapidly adapted and deployed to support clinical care on isolation wards. Feedback obtained during this period was incorporated into later versions of the system and provided a unique opportunity to examine changes in clinician priorities under pandemic conditions. Clinicians consistently prioritised alert visibility, alarm fatigue mitigation, parameter flexibility, and centralised monitoring. Notably, preferences regarding alert modality and access mechanisms evolved over time: early enthusiasm for mobile or smartphone-type devices shifted towards a preference for fixed, ward-based displays and audible alerts at the nurses' station following pandemic deployment. Building on previous wearability testing in healthy volunteers, wearability testing using a validated questionnaire was completed by 169 patient participants during the RCT. The chest patch and pulse oximeter demonstrated high tolerability, whereas the blood pressure cuff showed poor wearability and was removed from the final system.
These findings demonstrate the importance of iterative, clinician-led design for wearable monitoring systems and highlight how extreme clinical contexts such as the COVID-19 pandemic can significantly reshape perceived requirements for safety-critical monitoring technologies.

12
Harmonising UK primary care prescription records for research: A case study in the UK Biobank

Ytsma, C. R.; Torralbo, A.; Fitzpatrick, N. K.; Pietzner, M.; Louloudis, I.; Nguyen, D.; Ansarey, S.; Denaxas, S.

2026-04-22 health informatics 10.64898/2026.04.21.26351274 medRxiv
Top 3%
0.3%

Objective: The aim of this study was to develop and validate an automated, scalable framework to harmonise fragmented UK primary care prescription records into a research-ready dataset by mapping four diverse medical ontologies to a unified, historically comprehensive reference standard. Materials and Methods: We used raw prescription records for consented participants in the UK Biobank, in which participants are uniquely characterized by multiple data modalities. Primary care data were preprocessed by selecting one drug code if multiple were recorded, cleaning codes to match reference presentations, expanding code granularity based on drug descriptions, and updating outdated codes to a single reference version. Harmonisation entailed mapping British National Formulary (BNF) and Read2 codes to dm+d, the universal NHS standard vocabulary for uniquely identifying and prescribing medicines. Harmonised dm+d records were then homogenised to a single concept granularity, the Virtual Medicinal Product (VMP). We validated our methods by creating medication profiles mapping contemporary drug prescribing patterns in 312 physical and mental health conditions. Results: We preprocessed 57,659,844 records (100%) from 221,868 participants (100%). Of these, 48,950 records were dropped because they lacked a drug code. 7,357,572 records (13%) used multiple ontologies. Most records (76%) were encoded in BNF, and nearly half had their code granularity expanded via the drug description (N=28,034,282; 49%). 41,244,315 records (72%) were harmonised to dm+d, and 99.98% of these were converted to VMP, yielding a homogeneous dataset. Across 312 diseases, we identified 23,352 disease-drug associations with 237 medications (represented as BNF subparagraphs) that survived statistical correction, most of which resembled drug-indication pairs.
Conclusion: Our methodology converts highly fragmented raw prescription records of inconsistent data quality into a streamlined, enriched dataset at a single reference, version, and level of granularity. Harmonised prescription records can readily be used by researchers for large-scale analyses.
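The harmonisation step (source code → dm+d concept → VMP) is essentially a chain of lookups. The sketch below shows that shape only: every code is a placeholder, the lookup tables are hypothetical (real mappings would come from official NHS reference releases), and only the roll-up from Actual Medicinal Product (AMP) to VMP is modelled; other dm+d concept levels and the authors' preprocessing rules are omitted.

```python
from typing import Optional

# Hypothetical lookups with placeholder codes (not real BNF, Read2, or dm+d identifiers).
SOURCE_TO_DMD: dict[tuple[str, str], tuple[str, str]] = {
    ("BNF", "bnf-example-1"): ("AMP", "amp-example-1"),
    ("Read2", "read2-example-1"): ("VMP", "vmp-example-1"),
}
# Each Actual Medicinal Product (AMP) belongs to one Virtual Medicinal Product (VMP).
AMP_TO_VMP: dict[str, str] = {"amp-example-1": "vmp-example-1"}


def to_vmp(system: str, code: str) -> Optional[str]:
    """Map a source prescription code to a VMP, or None if it cannot be harmonised."""
    mapped = SOURCE_TO_DMD.get((system, code))
    if mapped is None:
        return None  # no dm+d mapping available in this sketch's lookup table
    level, dmd_code = mapped
    if level == "VMP":
        return dmd_code
    if level == "AMP":
        return AMP_TO_VMP.get(dmd_code)
    return None  # other dm+d levels are not handled in this sketch


print(to_vmp("BNF", "bnf-example-1"), to_vmp("Read2", "read2-example-1"), to_vmp("BNF", "missing"))
```

Rolling everything up to one concept level is what makes records from different source vocabularies directly comparable in downstream counts.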

13
Translation, Validation, and Application of Indonesian Genetic Literacy Questionnaires for Medical Students

Kemal, R. A.; Dhani, R.; Simanjuntak, A. M.; Rafles, A. I.; Triani, H. X.; Rahmi, T. M.; Akbar, V. A.; Firdaus, F.; Pratama, B. F.; Zulharman, Z.

2026-04-25 medical education 10.64898/2026.04.17.26350524 medRxiv
Top 3%
0.3%

Background: The increasing relevance of genetics and molecular biology in medicine necessitates greater genetic literacy among healthcare workers. Assessing literacy levels requires a validated genetic literacy questionnaire; therefore, a standardised Indonesian-language genetic literacy questionnaire is essential. Aims: We aimed to translate and validate three genetic literacy questionnaires (PUGGS, iGLAS, and UNC-GKS) for use among Indonesian medical students. We then evaluated genetic literacy levels using one of the validated questionnaires. Methods: The PUGGS, iGLAS, and UNC-GKS questionnaires were translated into Indonesian and then reviewed by an expert panel for translational accuracy and conceptual appropriateness. Back-translation was performed to confirm validity. Initial Indonesian versions of the questionnaires underwent cognitive pre-testing with 12 undergraduate medical students. After refinement, the questionnaires were validated among 34 first- to third-year medical students. The Indonesian version of the UNC-GKS questionnaire was then used to assess the genetic literacy of 486 participants, comprising 228 preclinical medical students, 187 clerkship students, and 71 residents. Results: The Indonesian versions of PUGGS (Cronbach's α = 0.819) and UNC-GKS (α = 0.809) demonstrated good reliability, while iGLAS showed poor reliability (α = 0.315). Among the 486 participants tested, 56% demonstrated moderate overall genetic literacy, and only 15.2% demonstrated good overall literacy. Basic genetic concepts were relatively well understood, with 54.3% having good literacy. In contrast, the effects of gene variants on health were poorly understood, with only 9.7% having good literacy. Inheritance concepts were moderately understood, with 24.9% having good literacy. Conclusion: The Indonesian translations of PUGGS and UNC-GKS are reliable tools for assessing genetic literacy among medical students. Using UNC-GKS, we observed predominantly moderate genetic literacy levels.
Curriculum improvement to better integrate genetics education is essential to support its clinical applications.
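Cronbach's α, the reliability statistic reported above, compares the sum of item variances with the variance of respondents' total scores: α = k/(k - 1) * (1 - sum of item variances / variance of total scores), for k items. A minimal sketch with toy item scores (not the study's data):

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_variance_sum = sum(pvariance(item) for item in zip(*scores))
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Toy data: three respondents answering two items.
print(round(cronbach_alpha([[2, 3], [4, 4], [3, 5]]), 3))
```

Population variance is used throughout; sample variance would give the same α, because the divisor cancels in the ratio as long as it is applied consistently.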

14
MedSafe-Dx (v0): A Safety-Focused Benchmark for Evaluating LLMs in Clinical Diagnostic Decision Support

Van Oyen, C.; Mirza-Haq, N.

2026-04-21 health informatics 10.64898/2026.04.14.26350711 medRxiv
Top 3%
0.3%

MedSafe-Dx (v0) is a new safety-focused benchmark for evaluating large language models in clinical diagnostic decision support, built on a filtered subset of the DDx Plus dataset (N=250). MedSafe-Dx evaluates three dimensions: escalation sensitivity, avoidance of false reassurance, and calibration of uncertainty. Models were tasked with providing a ranked differential (ICD-10 codes), an escalation decision (Urgent vs. Routine), and a confidence flag. Performance was measured via a "Safety Pass Rate," a composite metric penalizing three hard failure modes: missed escalations of life-threatening conditions, overconfident incorrect diagnoses, and unsafe reassurance in ambiguous cases. Evaluation of eleven models revealed a significant disconnect between diagnostic recall and safety. GPT-5.2 achieved the highest Safety Pass Rate (97.6%), while several models exhibited high rates of missed escalations or unsafe reassurance. MedSafe-Dx provides a robust stress test for identifying high-risk failure modes in diagnostic decision support and shows that high diagnostic accuracy does not guarantee clinical safety. While the benchmark is currently limited by synthetic data and proxy labels, it provides a reproducible, auditable framework for testing AI behavior before clinical deployment. Our findings suggest that interventions such as safety-focused prompting and reasoning-token budgets could be essential components for the safe deployment of LLMs in clinical workflows.
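The Safety Pass Rate can be read as the share of cases with none of the three hard failures. The sketch below assumes each case has already been scored for the three failure modes; how MedSafe-Dx decides that a response is overconfident or unsafely reassuring is not described in the abstract, so the `CaseResult` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical per-case flags for the three hard failure modes."""
    missed_escalation: bool
    overconfident_incorrect: bool
    unsafe_reassurance: bool

    def passed(self) -> bool:
        return not (self.missed_escalation or self.overconfident_incorrect or self.unsafe_reassurance)


def safety_pass_rate(results: list[CaseResult]) -> float:
    """Fraction of cases with no hard failure."""
    if not results:
        raise ValueError("no cases")
    return sum(r.passed() for r in results) / len(results)


cases = [CaseResult(False, False, False)] * 3 + [CaseResult(True, False, False)]
print(safety_pass_rate(cases))  # 0.75
```

On the 250-case subset, the reported 97.6% corresponds to 244 passing cases (244 / 250 = 0.976).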

15
MIMIC-IV-Phenotype-Atlas (MIPA): A Publicly Available Dataset for EHR Phenotyping

Yamga, E.; Goudrar, R.; Despres, P.

2026-04-24 health informatics 10.64898/2026.04.16.26350888 medRxiv
Top 3%
0.3%

Introduction: Secondary use of electronic health records (EHRs) often requires transforming raw clinical information into research-grade data. A central step in this process is EHR phenotyping: the identification of patient cohorts defined by specific medical conditions. Although numerous approaches exist, from ICD-based heuristics to supervised learning and large language models (LLMs), the field lacks standardized benchmark datasets, limiting reproducibility and hindering fair comparison across methods. Methods: We developed the MIMIC-IV Phenotype Atlas (MIPA) dataset, an adaptation of MIMIC-IV that provides expert-annotated discharge summaries across 16 phenotypes of varying prevalence and complexity. Two independent clinicians reviewed and labeled the discharge summaries, resolving disagreements by consensus. In parallel, we implemented a processing pipeline that extracts multimodal EHR features and generates training, validation, and testing datasets for supervised phenotyping. To illustrate MIPA's utility, we benchmarked four phenotyping methods: ICD-based classifiers, keyword-driven Term Frequency-Inverse Document Frequency (TF-IDF) classifiers, supervised machine learning (ML) models, and LLMs. Results: The final MIPA corpus consists of 1,388 expert-annotated discharge summaries. Annotation reliability was high (mean document-level kappa = 0.805, mean label-level kappa = 0.771), with 91% of disagreements resolved through consensus review. MIPA provides high-quality phenotype labels paired with structured EHR features and predefined train/validation/test splits for each phenotype. In the benchmarking case study, LLMs achieved the highest F1 scores in 13 of 16 phenotypes, particularly for conditions requiring contextual interpretation of clinical narrative, while supervised ML offered moderate improvements over rule-based baselines.
Conclusion: MIPA is the first publicly available benchmark dataset dedicated to EHR phenotyping, combining expert-curated annotations, broad phenotype coverage, and a reproducible processing pipeline. By enabling standardized comparison across ICD-based heuristics, ML models, and LLMs, MIPA provides a durable reference resource to advance methodological development in automated phenotyping.
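Cohen's kappa, used above to report annotation reliability, corrects raw agreement between two annotators for the agreement expected by chance from each annotator's label frequencies. A minimal sketch for a single phenotype label follows; the abstract does not specify how the document-level and label-level kappas were averaged, so that step is not modelled, and the labels are toy data.

```python
from collections import Counter


def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1:
        raise ValueError("chance agreement is 1; kappa undefined")
    return (observed - expected) / (1 - expected)


# Toy labels: 1 = phenotype present, 0 = absent.
print(cohens_kappa([1, 1, 0, 0, 1, 0, 1, 0], [1, 1, 0, 0, 1, 0, 0, 1]))  # 0.5
```

In the toy example the raters agree on 75% of items, but with balanced label frequencies 50% agreement is expected by chance, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.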

16
Large language models and retrieval augmented generation for complex clinical codelists: evaluating performance and assessing failure modes

Matthewman, J.; Denaxas, S.; Langan, S.; Painter, J. L.; Bate, A.

2026-04-24 health informatics 10.64898/2026.04.23.26351098 medRxiv
Top 3%
0.3%

Objectives: Large language models (LLMs) have shown promise in creating clinical codelists for research purposes, a time-consuming task that requires expert domain knowledge. Here, we evaluate the performance and assess the failure modes of a retrieval augmented generation (RAG) approach to creating clinical codelists for the large and complex medical terminology used by the Clinical Practice Research Datalink (CPRD). Materials & Methods: We set up a RAG system backed by a database of word embeddings of the medical terminology, which we created with a general-purpose word embedding model (gemini-embedding). We developed 7 reference codelists presenting different challenges, tagging each code as required or optional. We ran 168 evaluations (7 codelists × 2 database subsets × 4 models × 3 epochs). Scoring penalized the omission of required codes and the inclusion of irrelevant codes. We used model grading (i.e., grading by another LLM given the reference codelists as context) to evaluate the output codelists, with scores ranging from 0% (all incorrect) to 100% (all correct). Results: Accuracy varied across models and codelists. Gemini 3 Pro (score 43%) generally performed better than Claude Sonnet 4.6 (36%) and Gemini 3 Flash, with OpenAI GPT 5.2 performing worst (14%). Models performed better on shorter target codelists (e.g., Eosinophilic esophagitis with four codes and Hidradenitis suppurativa with 14 codes). By contrast, all models consistently failed to produce a complete Wrist fracture codelist (214 required codes). We further present evaluation summaries and failure mode evaluations produced by parsing LLM chat logs. Discussion: Beyond showing that a single-shot RAG approach is not currently suitable for codelist generation, we identify failure modes including hallucinations, retrieval failures, and generation failures in which retrieved codes are not used.
Conclusions: Our findings suggest that while RAG systems using current frontier LLMs may create correct clinical codelists in some cases, they still struggle with large and complex terminologies and with codelists containing many codes. The failure modes we highlight can inform the design of future workflows that avoid them.
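The study scored outputs by model grading. As a hypothetical, deterministic alternative that applies the same two criteria (omitted required codes and included irrelevant codes), a set-based score could look like the sketch below. The `combined` formula and the code identifiers are assumptions for illustration, not the authors' scoring rubric.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class CodelistScore:
    missing_required: set[str]  # required codes the model omitted
    irrelevant: set[str]        # produced codes that are neither required nor optional
    required_recall: float      # fraction of required codes that were produced
    combined: float             # hypothetical single score in [0, 1]


def score_codelist(generated: Iterable[str], required: Iterable[str],
                   optional: Iterable[str] = ()) -> CodelistScore:
    gen, req, opt = set(generated), set(required), set(optional)
    hits = gen & req
    missing = req - gen
    # Optional codes are neither rewarded nor penalized.
    irrelevant = gen - req - opt
    recall = len(hits) / len(req) if req else 1.0
    # Hypothetical combined score: every omission and every irrelevant code
    # counts against the total, so 1.0 means a complete, clean codelist.
    denom = len(req) + len(irrelevant)
    combined = len(hits) / denom if denom else 1.0
    return CodelistScore(missing, irrelevant, recall, combined)


# Hypothetical codes: four required, one optional, one irrelevant output.
result = score_codelist(["R1", "R2", "O1", "X9"],
                        required=["R1", "R2", "R3", "R4"], optional=["O1"])
print(sorted(result.missing_required), sorted(result.irrelevant))  # ['R3', 'R4'] ['X9']
print(result.required_recall, result.combined)  # 0.5 0.4
```

A deterministic score like this is reproducible across runs, whereas model grading can also judge borderline codes that a fixed reference set cannot anticipate.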

17
Subtypes of Internalizing and Externalizing Problems in Autistic Preschool Children: Participation in Daily Life and Family Outcomes

Nakamura, T.; Koshio, I.; Nagayama, H.

2026-04-21 psychiatry and clinical psychology 10.64898/2026.04.14.26350723 medRxiv
Top 4%
0.3%

Aim: Autistic children have a high but varied prevalence of internalizing and externalizing problems. This study aimed to identify subtypes of internalizing and externalizing problems among autistic preschool children in Japan, examine their temporal stability, and investigate differences in participation in daily life and family outcomes across these subtypes. Methods: A prospective cohort study was conducted with 275 caregivers of autistic children aged 51-75 months. Internalizing and externalizing problems were assessed using the Strengths and Difficulties Questionnaire. Results: Latent transition analysis identified five subtypes: Low-symptom, High-emotional, Externalizing, Comorbid, and Peer-difficulty groups. Membership in the High-emotional and Externalizing groups was relatively stable over time, whereas the Peer-difficulty group showed frequent transitions to subtypes with higher levels of internalizing or externalizing problems. Participation in daily life and family outcomes differed significantly across subtypes, but these differences did not follow a simple gradient of symptom levels. Conclusions: The novel findings that the temporal stability of subtype membership varied, and that participation in daily life and family outcomes differed across subtypes, suggest that the heterogeneity of internalizing and externalizing problems may be associated with variations in children's participation in daily life and family outcomes over time. Plain Language Summary: Autistic preschool children often experience emotional and behavioral difficulties, but the way these difficulties manifest varies widely across individuals. This study aimed to identify patterns of these difficulties, examine how they change over time, and investigate how participation in daily life and family outcomes vary among autistic preschool children. We conducted a study with 275 caregivers of autistic children aged 4-6 years in Japan.
From caregiver reports of children's emotional and behavioral difficulties, five distinct patterns were identified: a group with mainly emotional difficulties, a group with mainly behavioral difficulties, a group with both types of difficulties, a group with relatively low levels of difficulties, and a group characterized primarily by peer-related difficulties. Our findings suggest that different patterns of emotional and behavioral difficulties are associated with differences in children's participation in daily life and family outcomes. These differences could not be explained simply by the overall severity of difficulties; rather, they reflect distinct patterns based on the type of difficulty. The results indicate that autistic children face diverse difficulties that change over time.

18
A Randomized, Double-blind, Placebo-controlled, Multicenter Clinical Study of Chuanzhi Tongluo Capsule in Acute Ischemic Stroke (CONCERN): Study Rationale and Design

Yang, D.; Li, G.; Song, J.; Shi, X.; Xu, X.; Ma, J.; Guo, C.; Liu, C.; Yang, J.; Li, F.; Zhu, Y.; Zi, W.; Ding, Q.; Chen, Y.

2026-04-23 neurology 10.64898/2026.04.20.26351260 medRxiv
Top 4%
0.3%

Background: Acute ischemic stroke (AIS) remains a significant cause of disability worldwide. Current treatments, primarily intravenous thrombolysis (IVT), are limited by narrow time windows and reperfusion injury, leading to suboptimal outcomes for many patients. Chuanzhi Tongluo (CZTL), a traditional Chinese medicine, has been preliminarily recognized as a novel cerebral protection agent in animal models. Objectives: This trial investigates the efficacy and safety of CZTL capsules in patients with AIS who are not eligible for IVT or who experience early neurological deterioration after IVT. Methods and design: The CONCERN trial is an investigator-initiated, prospective, multicenter, double-blind, parallel-control, randomized clinical study in China. An estimated 1,208 eligible participants at approximately 70 stroke centers in China will be consecutively randomized in a 1:1 ratio to CZTL capsules or placebo. All enrolled patients will take 2 capsules of CZTL or placebo orally 3 times a day, together with antiplatelet agents, for 3 months. Outcomes: The primary endpoint is an excellent functional outcome, defined as a modified Rankin Scale (mRS) score of 0 or 1 at 90 days. The main safety endpoints include 90-day mortality and symptomatic intracranial hemorrhage within 48 hours. Conclusions: The results of the CONCERN trial will determine the clinical efficacy and safety of CZTL capsules in the treatment of patients with AIS. Trial registry number: ChiCTR2300074147 (www.chictr.org.cn).

19
CohortContrast: An R Package for Enrichment-Based Identification of Clinically Relevant Concepts in OMOP CDM Data

Haug, M.; Ilves, N.; Umov, N.; Loorents, H.; Suvalov, H.; Tamm, S.; Oja, M.; Reisberg, S.; Vilo, J.; Kolde, R.

2026-04-23 health informatics 10.64898/2026.04.22.26351461 medRxiv
Top 4%
0.3%

Objective: To address the unresolved bottleneck of selecting cohort-relevant clinical concepts for treatment trajectory analysis in observational health data, we introduce CohortContrast, an OMOP-compatible R package for enrichment-based concept identification, temporal and semantic noise reduction, and concept aggregation, enabling cohort-level characterization and downstream trajectory analysis. Materials and Methods: We developed CohortContrast and applied it to OMOP-mapped observational data from the Estonian nationwide OPTIMA database, which includes all cases of lung, breast, and prostate cancer; here we focus on the lung and prostate cancer cohorts. The workflow combines target-control statistical enrichment, temporal/global noise filtering, hierarchical concept aggregation, and correlation-based merging, with optional patient clustering for downstream trajectory exploration. We validated the approach with a clinician-based plausibility assessment of extracted diagnosis-concept pairs and evaluated a large language model (LLM) as an auxiliary filtering step. Results: We analyzed 7,579 lung cancer and 11,547 prostate cancer patients. The workflow reduced concept dimensionality from 5,793 to 296 concepts (a 94.9% reduction) in lung cancer and from 5,759 to 170 concepts (a 97.0% reduction) in prostate cancer, and identified three exploratory patient subgroups in both cohorts. In a plausibility assessment of 466 diagnosis-concept pairs, validators rated 31.3% as directly linked and 57.5% as indirectly linked. Discussion: CohortContrast reduces manual concept curation by prioritizing and aggregating cohort-relevant concepts while preserving clinically interpretable treatment patterns in OMOP-based real-world data. Conclusion: CohortContrast enables scalable reduction of broad OMOP concept spaces into clinically interpretable, cohort-specific representations for exploratory trajectory analysis and real-world evidence research.
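The abstract does not specify CohortContrast's enrichment test or its R interface. The Python sketch below is a hypothetical stand-in for target-control enrichment: it keeps concepts whose patient prevalence in the target cohort is high enough and sufficiently larger than in the control cohort (both thresholds are arbitrary assumptions). It also checks the reported dimensionality reductions arithmetically.

```python
import math


def enriched_concepts(target_counts, n_target, control_counts, n_control,
                      min_target_prev=0.05, min_ratio=2.0):
    """Return (concept, prevalence_ratio) pairs enriched in the target cohort.

    target_counts / control_counts map a concept ID to the number of patients
    in that cohort with at least one record of the concept. The thresholds are
    illustrative defaults, not CohortContrast's.
    """
    selected = []
    for concept, count in target_counts.items():
        p_target = count / n_target
        p_control = control_counts.get(concept, 0) / n_control
        # A concept never seen in controls gets an infinite ratio.
        ratio = p_target / p_control if p_control > 0 else math.inf
        if p_target >= min_target_prev and ratio >= min_ratio:
            selected.append((concept, ratio))
    # Strongest enrichment first.
    selected.sort(key=lambda pair: pair[1], reverse=True)
    return selected


def reduction_pct(before, after):
    """Percentage reduction in the number of concepts."""
    return 100.0 * (1.0 - after / before)


# Hypothetical cohorts: 100 target and 200 control patients.
hits = enriched_concepts({"concept_A": 40, "concept_B": 90, "concept_C": 2}, 100,
                         {"concept_A": 4, "concept_B": 170}, 200)
print([c for c, _ in hits])  # ['concept_A']
print(round(reduction_pct(5793, 296), 1))  # 94.9 (lung cancer)
print(round(reduction_pct(5759, 170), 1))  # 97.0 (prostate cancer)
```

In the example, `concept_B` is common in both cohorts and is dropped as background, while `concept_C` is absent from controls but too rare in the target cohort to keep, which is why a minimum prevalence is paired with the ratio.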

20
Wavelet analysis reveals non-stationary cardiovascular rhythms associated with delirium and deep sedation in ICU patients

Sreekanth, J.; Salgado-Baez, E.; Edel, A.; Gruenewald, E.; Piper, S. K.; Spies, C.; Balzer, F.; Boie, S. D.

2026-04-23 health informatics 10.64898/2026.04.22.26351455 medRxiv
Top 4%
0.3%

Routine ICU data offer valuable insights into daily physiological rhythms. While traditional methods assume that these cycles maintain fixed periods and amplitudes, their inherent variability requires dynamic estimation of instantaneous trends. The wavelet transform effectively resolves circadian oscillations, especially for frequently measured vital parameters. We present novel extensions to Continuous Wavelet Transform (CWT) power spectral analysis to better detect and segment subtle temporal patterns. Using this approach, we uncover hidden circadian patterns in cardiovascular vital signs such as heart rate (HR) and mean blood pressure (MBP), measured over five days in a retrospective cohort of 855 ICU patients. By quantifying non-stationary rhythms, we identified diurnal and semi-diurnal oscillations whose period and power varied with delirium and deep sedation. Notably, HR exhibits a clear diurnal and semi-diurnal rhythm when delirium is absent. Overall, our framework supports the CWT as a powerful tool for analyzing complex physiological signals, particularly vital signs. Crucially, our findings suggest that cardiovascular rhythm disruption may be associated with ICU-related delirium and deep sedation.
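For readers unfamiliar with CWT power, the sketch below computes Morlet wavelet power at a single target period for an evenly sampled series, using the scale-to-period relation for the Morlet wavelet given by Torrence and Compo (1998). It is a plain CWT, not the authors' extensions, and the synthetic hourly heart-rate signal is an assumption made for illustration.

```python
import cmath
import math


def morlet_scale_for_period(period, w0=6.0):
    """Scale s whose Morlet Fourier period equals `period` (Torrence & Compo, 1998)."""
    return period * (w0 + math.sqrt(2.0 + w0 ** 2)) / (4.0 * math.pi)


def cwt_power(signal, dt, period, w0=6.0):
    """Morlet wavelet power |W(t, s)|^2 at every sample for one target period.

    Direct O(N^2) sum. The series mean is removed first so that a large baseline
    (e.g. HR around 80 bpm) does not leak into the coefficients; values near the
    edges treat unobserved samples as zero and should be read with care.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    s = morlet_scale_for_period(period, w0)
    norm = math.sqrt(dt / s) * math.pi ** -0.25
    powers = []
    for t in range(n):
        coeff = 0j
        for k in range(n):
            eta = (k - t) * dt / s
            # Complex conjugate of the Morlet mother wavelet at eta.
            psi_conj = cmath.exp(-1j * w0 * eta) * math.exp(-0.5 * eta * eta)
            coeff += centered[k] * psi_conj
        powers.append(abs(norm * coeff) ** 2)
    return powers


# Hypothetical 10 days of hourly HR with a pure 24-hour rhythm.
hr = [80 + 5 * math.cos(2 * math.pi * h / 24) for h in range(240)]
p24 = cwt_power(hr, dt=1.0, period=24.0)
p12 = cwt_power(hr, dt=1.0, period=12.0)
print(p24[120] > p12[120])  # True: diurnal power dominates mid-series
```

Comparing power at 24 h and 12 h at each time point is one simple way to track diurnal and semi-diurnal components as they drift; a full analysis would typically use an FFT-based transform over many scales and an explicit cone of influence.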